Bounds for Regret-Matching Algorithms

Authors

  • Amy Greenwald
  • Zheng Li
  • Casey Marks
Abstract

We introduce a general class of learning algorithms, regret-matching algorithms, and a regret-based framework for analyzing their performance in online decision problems. Our analytic framework is based on a set Φ of transformations over the set of actions. Specifically, we calculate a Φ-regret vector by comparing the average reward obtained by an agent over some finite sequence of rounds to the average reward that could have been obtained had the agent instead played each transformation φ ∈ Φ of its sequence of actions. The regret-matching algorithms analyzed here select the agent’s next action based on the vector of Φ-regrets, along with a link function f. Many well-studied learning algorithms are seen to be instances of regret matching. We derive bounds on the regret experienced by (f, Φ)-regret-matching algorithms for polynomial and exponential link functions (though we consider polynomial link functions for p > 1 rather than p ≥ 2). Although we do not improve upon the bounds reported in past work (except in special cases), our means of analysis is more general, in part because we do not rely directly on Taylor’s theorem. Hence, we can analyze algorithms based on a larger class of link functions, particularly non-differentiable link functions. In ongoing work, we are studying regret matching with alternative link functions beyond the polynomial and exponential families.
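To make the action-selection rule concrete, the following is a minimal sketch of regret matching with a polynomial link function f(x) = max(x, 0)^(p−1), restricted to the simplest choice of Φ (external regret, i.e. comparison against every fixed action) in a full-information setting. All function and variable names are illustrative, not taken from the paper:

```python
import numpy as np

def regret_matching(rewards, p=2.0, rng=None):
    """Sketch of (f, Phi)-regret matching with polynomial link
    f(x) = max(x, 0)**(p - 1), for external regret (Phi = the
    constant transformations, i.e. "always play action k").

    rewards: T x n array; rewards[t, k] is the reward action k
    would have earned in round t (full-information feedback).
    Returns the average reward earned and the average regret vector.
    """
    rng = rng or np.random.default_rng(0)
    T, n = rewards.shape
    cum_regret = np.zeros(n)   # cumulative regret against each fixed action
    total_reward = 0.0
    for t in range(T):
        # Apply the link function to the regret vector, then normalize
        # to obtain the mixed strategy for this round.
        link = np.maximum(cum_regret, 0.0) ** (p - 1)
        if link.sum() > 0:
            probs = link / link.sum()
        else:
            probs = np.full(n, 1.0 / n)  # no positive regret: play uniformly
        a = rng.choice(n, p=probs)
        total_reward += rewards[t, a]
        # Regret update: what each fixed action would have earned,
        # minus what we actually earned this round.
        cum_regret += rewards[t] - rewards[t, a]
    return total_reward / T, cum_regret / T
```

With p = 2 this recovers the classic Hart–Mas-Colell style regret matching (probabilities proportional to positive regrets); larger p interpolates toward greedier play on the most-regretted action.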


Similar resources

Bounds for Regret-Matching Algorithms

We study a general class of learning algorithms, which we call regret-matching algorithms, along with a general framework for analyzing their performance in online (sequential) decision problems (ODPs). In each round of an ODP, an agent chooses a probabilistic action and receives a reward. The particular reward function that applies at any given round is not revealed until after the agent acts....


Fair Algorithms for Infinite Contextual Bandits

We study fairness in infinite linear bandit problems. Starting from the notion of meritocratic fairness introduced in Joseph et al. [9], we expand their notion of fairness for infinite action spaces and provide an algorithm that obtains a sublinear but instance-dependent regret guarantee. We then show that this instance dependence is a necessary cost of our fairness definition with a matching l...


No-regret algorithms for structured prediction problems—DRAFT

No-regret algorithms are a popular class of learning rules which map a sequence of input vectors x1, x2 . . . to a sequence of predictions y1, y2, . . .. Unfortunately, most no-regret algorithms assume that the predictions yt are chosen from a small, discrete set. We consider instead prediction problems where yt has internal structure: yt might be a strategy in a game like poker, or a configura...


Regret bounds for Non Convex Quadratic Losses Online Learning over Reproducing Kernel Hilbert Spaces

We present several online algorithms with dimension-free regret bounds for general nonconvex quadratic losses by viewing them as functions in Reproducing Kernel Hilbert Spaces. In our work we adapt the Online Gradient Descent, Follow the Regularized Leader, and Conditional Gradient meta-algorithms for RKHS settings and provide regret bounds in this setting. By analyzing them as algorith...


No-regret algorithms for Online Convex Programs

Online convex programming has recently emerged as a powerful primitive for designing machine learning algorithms. For example, OCP can be used for learning a linear classifier, dynamically rebalancing a binary search tree, finding the shortest path in a graph with unknown edge lengths, solving a structured classification problem, or finding a good strategy in an extensive-form game. Several res...



Journal:

Volume   Issue 

Pages  -

Publication date: 2006